Thorndike and Hagen, 1 Ahmann and
Glock, 2 Nunnally, 3 and others have noted
the limitations of commercially published 
achievement tests when they are used in 
determining the extent to which curricular 
goals peculiar to a given school, school 
district, or geographic region have been 
achieved. The dilemma of the test pub- 
lisher, who must produce a subject matter 
test general enough to cover diverse cur- 
ricular goals but specific enough to achieve 
individual sales, has its counterpart with the 
teacher or supervisor, who must choose 
between a well-constructed and field-tested 
instrument with limited content validity for 
his particular course, and the technically less 
sound instrument that he or his department 
might develop to meet specific local needs. 

There are unit tests in chemistry available 
commercially, of course, but with the exception
of one series designed for use with advanced-level
college students, 4 these instruments
are intended to be used in connection
with a particular textbook. No one instrument
among them, therefore, could be appropriate
for different schools using different
texts, and considering the structure of the 
New York State Regents Syllabus, it seems 
unlikely that any single series could ade- 
quately represent the required content of 
high school chemistry, as most teachers of 
that subject in New York view it. Since the 
standard Regents Examination in any course
is an important consideration to teachers
throughout the State, moreover, the practice 
of constructing unit tests for local use, using 
old Regents tests, is quite common; it is 
expected that practice with Regents-type
items will enhance pupils’ prospects of suc- 
cess with the final examination. 

Two major limitations face the teacher 
who is sincerely interested in developing a 
series of sound, subject matter tests with 
specifications consistent with his own particular
course objectives. The writing of good
test items, first of all, has been properly 
called “an art . . . demanding the utmost
degree of creativity, ingenuity, and per- 
sistence on the part of anyone who practices 
it,” 5 and to these requirements — most 
practitioners would allow — should be added 
a sizable quantity of time; few individuals 
in teaching will find themselves, simultane- 
ously, in possession of all four. 

The project described here attempted to 
overcome these limitations by bringing to- 
gether a group of teachers to provide a pool 
of time, creativity, ingenuity, and persistence 
with a group of professors from two neigh- 
boring universities, to provide technical 
assistance. Twelve teachers of high school 
chemistry and four university professors 
participated throughout the two years of the 
study (1961-1963); three additional chem- 
istry teachers joined the effort during the 
second year. Preliminary field testing of the
seven tests was accomplished during the
1963-1964 school year.

Fig. 1. Regression equation with halogen elements test score plotted against periodic table test score.
0.79 = periodic table test score; 1 = halogen elements test score; 15.38 = constant = chemistry grade
(± 9.75).

Objective 

The objective of the project was to pro- 
duce an instrument that would contribute 
significantly to the prediction of success in 
high-school chemistry courses following the 
syllabus approved by the Regents of the 
University of the State of New York, and to 
develop a series of unit tests that would 
measure accomplishment with respect to 
selected areas of the Regents Syllabus. The 
unit tests were intended to provide check- 
point data for the evaluation of individual 
and group progress, and to permit analysis of 
the contributions made to total accomplish- 
ment by higher order mental processes, as 
well as by recall. Both the public school 
and the university personnel participating in 
the study were agreed that the Regents
Examination in Chemistry (the standard
final examination for pupils enrolled in 
courses following the Regents Syllabus) 
placed more emphasis on the recall process 
than could be justified in terms of such 
frequently mentioned goals of chemistry 
instruction as development of the abilities to
reason scientifically, to think critically, and 
to solve problems efficiently. 

Bloom and his co-workers had already 
recognized this need, of course, and had 
attempted to contribute to its fulfillment 
through their Taxonomy of Educational 
Objectives. 6 So far as measurement in
science education is concerned, however, the 
only use to which the writers of this paper 
could find the Taxonomy having been put — 
at least in the first five years following its 
publication — was in connection with the 
compilation of a folio of test items. 7 View- 
ing the Taxonomy as the best available guide 
for the measurement of cognitive competencies
other than recall, the group agreed at its
first meeting to attempt to incorporate test 
items representative of Bloom’s categories in 
each of the unit tests. 

Procedures in Developing the Pretest and 
Unit Tests 

At the initial meeting of the group, six 
units of study from the Regents Syllabus 
were agreed upon as progress checkpoints 
that would be appropriate in terms of the 
teaching patterns followed by all of the 
participating teachers. The teachers then 
submitted to the consultants tests and test 
items — mostly from their files of old tests— 
related to these selected study units, to pro- 
vide item pools for the tests. Concurrently, 
and independently, each teacher responded 
to a questionnaire prepared by the consultants,
indicating the relative emphasis
given to discrete topics within each of the 
selected study units. Responses to the 
questionnaire were averaged to provide the 
tables of specifications for the unit tests, the 
teachers having agreed in advance to this 
procedure. 
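
The averaging step described above can be sketched as follows. This is a minimal illustration only: the topic names, the rating scale, and the largest-remainder rounding rule are assumptions introduced here, since the paper reports only that questionnaire responses were averaged.

```python
from math import floor


def table_of_specifications(ratings_by_teacher, n_items):
    """Average emphasis ratings per topic, then allocate items in proportion.

    ratings_by_teacher: one dict per teacher mapping topic -> emphasis rating.
    The rounding rule (largest remainder) is an assumption, not the paper's.
    """
    topics = list(ratings_by_teacher[0])
    n_teachers = len(ratings_by_teacher)
    mean_emphasis = {
        t: sum(r[t] for r in ratings_by_teacher) / n_teachers for t in topics
    }
    total = sum(mean_emphasis.values())
    exact = {t: n_items * mean_emphasis[t] / total for t in topics}
    allocation = {t: floor(exact[t]) for t in topics}
    leftover = n_items - sum(allocation.values())
    # Hand the remaining items to the topics with the largest fractional parts.
    by_remainder = sorted(topics, key=lambda t: exact[t] - allocation[t], reverse=True)
    for t in by_remainder[:leftover]:
        allocation[t] += 1
    return allocation


# Hypothetical ratings from three teachers for one study unit.
ratings = [
    {"atomic structure": 4, "periodic trends": 5, "families of elements": 3},
    {"atomic structure": 3, "periodic trends": 4, "families of elements": 3},
    {"atomic structure": 5, "periodic trends": 4, "families of elements": 2},
]
spec = table_of_specifications(ratings, n_items=50)
print(spec)
```

Averaging treats every teacher's emphasis as equally weighted, which matches the advance agreement reported above; a weighted scheme would require information the paper does not give.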

The test items supplied by the teachers 
were edited by the consultants and were 
converted, where necessary, to multiple 
choice questions. Approximately one hun- 
dred items were prepared for each of the se- 
lected study units, with the content distribu- 
tion following, as closely as possible, that of 
the appropriate table of specifications. 
Particular care was taken to include items 
which could be rated according to the several 
cognitive competencies of the Taxonomy. 

It was necessary to make several compro- 
mises in adapting the Taxonomy for use in 
this project. The considerable detail of the 
Taxonomy could not be fully exploited in unit 
tests which were to be capable of administration
within a single, 40-minute class session,
and so it was decided that only the major
taxonomical subdivisions should be employed.
Secondly, it was found (to no one’s
surprise) that items requiring analysis, synthesis,
or evaluation are quite difficult to
construct in the multiple choice form that
had been agreed upon. The categorical 
definitions, finally, are relatively cumber- 
some and a somewhat simpler classification 
scheme was decided upon. Accordingly, the
following four categories were established for 
the purposes of the project: 

Recall: Any item which had been taught 
in substantially the same form as that in 
which it appeared in the test, requiring mere 
resurrection of a particular bit of informa- 
tion. Rephrasing, inversion of sentences, 
and similar form changes do not remove an 
item from the recall category. 

Comprehension: Any item requiring the 
application of a principle under circum- 
stances different from those constituting the 
teaching context of the principle, but in such 
form that the correct principle is implied in 
the question. 

Application: Similar to comprehension,
but the required principle is not implied in
the question, so that the student must select 
the appropriate principle from his repertoire 
of learned principles, as well as apply it cor- 
rectly. Quantitative problems were con- 
sidered in this category. 

Higher Competencies: This category in- 
cluded items which required analysis of a 
complex situation and the subsequent draw- 
ing of analytic, synthetic, or evaluative 
inferences. 

Individual test items were categorized
according to the above classification scheme 
by the science education consultants, and 
these independent judgments were found to 
be identical with respect to a substantial 
majority of the items. In cases of disagree- 
ment, an attempt was made to reconcile 
differences of opinion through conference, 
and failing this, the item in question was 
discarded. While the writers felt that the
degree of judgmental correspondence 
achieved was adequate to sustain the opera- 
tional adequacy of the system, they also con- 
cluded that the taxonomical categories other 
than recall, at least, as they were employed in 
this study, are not without ambiguity. 
Differentiation of recall-type items from
other types, on the other hand, proved to be
a simple matter.

TABLE I

Distribution of Item Types among the Unit Tests

                                               Item type
Unit test                           Recall  Comprehension  Application  Higher order
Periodic table                          15             12           10            13
Water, solutions, and ionization        17             15           12             6
The halogen elements                    15             18            8             9
Sulfur and nitrogen                     22             10            8            10
Principles of organic chemistry         14             30            5             1
Principles of chemical reactions        12             14           18             6
Totals                                  95             99           61            45

Two preliminary forms of each unit test 
were then constructed and distributed to the 
participating teachers for class administra- 
tion at a time considered to be appropriate 
by the teacher in terms of individual class 
progress. Each of the preliminary forms 
consisted of fifty items, and upon the completion
of all administrations of the tests pertaining
to a particular unit, item analyses
were performed to determine item difficulty,
discriminating efficiency, and the presence of 
noneffective foils. At a general meeting of 
the cooperating group following the comple- 
tion of preliminary trial of each unit test, test 
items which had proved to be confusing to 
pupils or to present other similar problems 
were eliminated from consideration. Items 
finally selected for inclusion in each of the 
unit tests had difficulty indices ranging from 
0.35 to 0.80 (proportion passing), had foils of 
demonstrated effectiveness, and discrimi- 
nated significantly on the basis of local test 
score. In its final form, each unit test con- 
sisted of fifty items, distributed according to 
item type as shown in Table I. 
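
The three item statistics named above can be illustrated with a short sketch. The paper does not say which discrimination statistic was used, so the upper-lower group index with 27 per cent tails is an assumption here, as is the rule that a foil chosen by no pupil counts as noneffective; the response data are invented.

```python
from collections import Counter


def analyze_item(records, key, options=("A", "B", "C", "D"), tail=0.27):
    """Item analysis for one multiple choice item.

    records: list of (total_test_score, chosen_option), one per pupil.
    Returns the difficulty index (proportion passing), an upper-lower
    discrimination index, and any foils that no pupil selected.
    """
    n = len(records)
    difficulty = sum(choice == key for _, choice in records) / n

    # Compare the highest- and lowest-scoring groups on the whole test.
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    k = max(1, round(tail * n))
    p_upper = sum(choice == key for _, choice in ranked[:k]) / k
    p_lower = sum(choice == key for _, choice in ranked[-k:]) / k

    counts = Counter(choice for _, choice in records)
    dead_foils = [o for o in options if o != key and counts[o] == 0]
    return {
        "difficulty": difficulty,
        "discrimination": p_upper - p_lower,
        "dead_foils": dead_foils,
    }


# Hypothetical responses of ten pupils to an item keyed "B".
records = [(48, "B"), (45, "B"), (41, "B"), (38, "B"), (35, "A"),
           (33, "B"), (30, "C"), (27, "B"), (22, "A"), (18, "C")]
result = analyze_item(records, key="B")
keep = 0.35 <= result["difficulty"] <= 0.80 and not result["dead_foils"]
print(result, keep)
```

In this invented case the item falls inside the 0.35 to 0.80 difficulty band but would be flagged for revision because option D attracted no one.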

In developing the chemistry pretest, the 
primary goal was production of an instru- 
ment that would permit efficient forecasting 
of Regents Examination results. In consideration
of this goal, content analysis of
immediate past Regents Exams was em- 
ployed to construct a table of specifications, 
and items were almost entirely of the recall 
type, dealing with elements of chemical 
knowledge that the experienced teachers felt 
might reasonably be expected to have been 
presented to beginning chemistry students at 
some time in their prior schooling. From 
trial forms administered in September, 1962, 
a final form consisting of forty multiple 
choice items was produced, following essen- 
tially the same procedures as were used in 
developing the unit tests. 

Results of Field Testing 

During the 1963-64 school year, the pre- 
test and unit tests were used by all of the 
participating teachers in their regular classes. 
To insure a usual level of motivation on the 
part of pupils taking the tests, the unit tests 
were used as regular, periodic tests of normal 
influence in determining grades. From a 
beginning population of about 1200 pupils 
enrolled in chemistry, complete data — in- 
cluding scores on the Chemistry Regents 
Examination — were obtained for 801, so that 
results obtained from these data should 
probably be considered as applicable only to 
rather healthy pupils (i.e., who are not absent
from school for one or more tests) who
do not move from one school district to 
another, and who actually finish the chemis- 
try course. 

Table II presents the intercorrelations 
among the unit tests and the chemistry 
pretest. The coefficients range from 0.45 to 0.78, with a
median value of 0.69; the tests are considered to be
about as independent as the common
thread of chemistry running through
them would permit. Among the six unit 
tests, the range is 0.61 to 0.78, reflecting their 
greater similarity to one another than to the 
practically exclusively recall-type items of
the pretest; the correlations of pretest scores 
with those on the unit tests, as might be 
expected, are the lowest in the matrix. 

TABLE II

Intercorrelations among Tests, N = 801

Test                                            2      3      4      5      6      7
1. Chemistry pretest                         0.55   0.59   0.53   0.45   0.51   0.51
2. Periodic table test                         --   0.71   0.72   0.61   0.69   0.70
3. Water, solutions, and ionization test              --   0.76   0.65   0.69   0.74
4. The halogen elements test                                 --   0.69   0.73   0.78
5. The sulfur-nitrogen test                                         --   0.72   0.68
6. Organic chemistry test                                                  --   0.75
7. Principles of chemical reactions test                                          --

Since approximately half of the variance in
scores for most of the pairs of unit tests
appeared to be shared variance, the prospects
for efficient prediction of Regents Examina- 
tion scores through multivariate analyses 
appeared favorable. Multiple regression 
analyses following the June, 1964, Regents 
Examination proved this to be the case, at 
least so far as the employment of two predictor
variables was concerned; addition of
predictors beyond two, however, did not 
yield correlations of sufficiently increased 
magnitude to justify the additional compu- 
tational complexity. Multiple correlation 
coefficients ranged from 0.75 with the chem- 
istry pretest and periodic table test as 
predicting instruments to 0.84, using the unit 
tests covering “principles of organic chemis- 
try” and “principles of chemical reactions.” 
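
The arithmetic behind the shared-variance remark and the two-predictor multiple correlation can be sketched as below. The formula is the standard one for two predictors; the correlations of each unit test with the Regents Examination are not reported in the paper, so the values 0.78 and 0.80 are hypothetical, while 0.69 and 0.75 are taken from Table II.

```python
from math import sqrt


def shared_variance(r):
    """Proportion of variance two measures share, given their correlation."""
    return r * r


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with two predictors.

    r_y1, r_y2: correlation of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Median intercorrelation from Table II: about 48 per cent shared variance.
median_shared = shared_variance(0.69)

# Hypothetical validities for the organic chemistry and chemical reactions
# tests, combined with their reported intercorrelation of 0.75.
example_R = multiple_r(0.78, 0.80, 0.75)
print(round(median_shared, 4), round(example_R, 3))
```

Because the two predictors are themselves highly correlated, the second one adds only a modest increment over the better single predictor, which is consistent with the report that predictors beyond two were not worth the added computation.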

Whatever opinion one may hold regarding 
the statewide examining system in New York 
(and opinions are diverse), there is no deny- 
ing the fact that performance of pupils on 
Regents Examinations is of considerable 
importance to teachers, counselors, parents, 
and the pupils themselves. For this reason, 
it is of local import to be able to make pre- 
dictions regarding Regents scores, hopefully 
in the case of undesirable predictions, so that 
special steps may be taken to alter the pre- 
dicted outcome. To facilitate the predic- 
tion process, the consultants prepared a 
series of nomographs depicting the two- 
predictor regression equations in graphic 
form,* an example of which is shown in
Figure 1. Use of the nomographs by
teachers and counselors is readily apparent,
but it may well be, also, that they can be of
direct help to students by enabling them to
relate their individual progress to date to the
Regents scores earned in the past by students
with similar progress characteristics.
The fact that the June, 1964 Chemistry
Regents Examination was generally conceded
to be of greater-than-usual difficulty
should be advantageous in this regard.

* The nomographs were produced by a CalComp
570 digital plotter (California Computer Products)
from data prepared by an IBM 7074 System.
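
What a nomograph of this kind does can be shown numerically: it evaluates a two-predictor regression equation and attaches a band of one standard error of estimate. The sketch below assumes that form; the weights, constant, and standard error are hypothetical round numbers chosen for illustration and are not the coefficients of Figure 1.

```python
def predict_regents(score_a, score_b, b_a, b_b, constant, std_error):
    """Evaluate a two-predictor regression equation.

    Returns (low, estimate, high), where the band is the estimate plus or
    minus one standard error of estimate.
    """
    estimate = b_a * score_a + b_b * score_b + constant
    return estimate - std_error, estimate, estimate + std_error


# Hypothetical regression weights; not taken from the paper.
HYPOTHETICAL = {"b_a": 0.8, "b_b": 0.9, "constant": 15.0, "std_error": 10.0}

# A pupil scoring 35 and 40 on two fifty-item unit tests.
low, estimate, high = predict_regents(35, 40, **HYPOTHETICAL)
print(low, estimate, high)
```

Reporting the band rather than the point estimate alone keeps the width of the error in view when a prediction is used to decide whether special steps are needed for a pupil.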

Following the completion of all unit test- 
ing, four scores were derived for each student 
who had taken all six tests, representing the 
total number of items answered correctly on 
all tests for each of the four item categories. 
The intercorrelations among these scores are 
presented in Table III and will immediately 
be noted to be rather substantial for scores 
that presume to represent quite distinct 
intellectual competencies. 
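
The derivation of the four category scores and their intercorrelations can be sketched as follows. The data layout and the pupil scores are invented for illustration (two unit tests rather than six, four pupils rather than 801); only the procedure of summing correct items by category and correlating the sums comes from the text.

```python
from math import sqrt

CATEGORIES = ("recall", "comprehension", "application", "higher")


def category_totals(per_test_scores):
    """Sum one pupil's correct-item counts over all unit tests, by category."""
    return {c: sum(test[c] for test in per_test_scores) for c in CATEGORIES}


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def intercorrelations(pupils):
    """Correlate every pair of category totals across pupils."""
    totals = [category_totals(p) for p in pupils]
    matrix = {}
    for i, a in enumerate(CATEGORIES):
        for b in CATEGORIES[i + 1:]:
            matrix[(a, b)] = pearson_r([t[a] for t in totals], [t[b] for t in totals])
    return matrix


# Hypothetical pupils, each with correct-item counts on two unit tests.
pupils = [
    [{"recall": 12, "comprehension": 9, "application": 7, "higher": 8},
     {"recall": 14, "comprehension": 11, "application": 8, "higher": 3}],
    [{"recall": 9, "comprehension": 6, "application": 4, "higher": 5},
     {"recall": 10, "comprehension": 8, "application": 6, "higher": 2}],
    [{"recall": 14, "comprehension": 11, "application": 9, "higher": 11},
     {"recall": 16, "comprehension": 13, "application": 10, "higher": 5}],
    [{"recall": 7, "comprehension": 5, "application": 3, "higher": 4},
     {"recall": 9, "comprehension": 9, "application": 5, "higher": 1}],
]
matrix = intercorrelations(pupils)
for pair, r in matrix.items():
    print(pair, round(r, 2))
```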

Considering the difficulty that was en- 
countered in differentiating among items 
calling for abilities other than recall, it would 
not have been surprising to find high correla- 
tions among the three other categories. But 
the correlations between recall-type items 
and the other categories are equally substan- 
tial, indicating that a common factor is 
influential in determining performance on all 
four types of items. It may be, of course,
that this factor is nothing more than recall
(or memory), itself, which is an essential
component of adequate functioning in any
cognitive area. Other possible explanatory
constructs would include general scholastic
aptitude (or specific chemistry aptitude) that
largely determines progress in all cognitive
areas, or some combination of these.

TABLE III

Intercorrelations among Item Types, N = 801

Item type                                       2      3      4
1. Recall                                    0.87   0.84   0.81
2. Comprehension                               --   0.84   0.80
3. Application                                        --   0.82
4. Analysis, synthesis, and evaluation                       --

Whatever the explanation, it is apparent 
to the individuals who have worked on this 
project that the practical application of the 
Taxonomy of Educational Objectives, in such a 
way as to produce meaningful results, is no 
easy task. With the data available to this 
investigation, no further conclusive explana- 
tion of the high correlations can be offered, 
but the finding does raise a definite question 
regarding the utility of the Taxonomy in test 
construction, particularly where the indi- 
vidual teacher depends upon his lone efforts 
for his own test. 

Conclusions 

As a means of dealing with the problems of 
developing achievement tests directed 
toward specific local requirements, the 
cooperative approach may certainly be rec- 
ommended. Aside from the requirement for 
statistical demonstrations of test adequacy, 
the acid test of an achievement examination 
is teacher opinion, frequently influenced by 
pupil opinion, and in this respect, the tests 
developed as described here have passed. 
The tests are now being used as an integral 
part of the chemistry courses in the schools 
which participated in the study; the teachers 
like them, and it really is of little import that 
a large measure of this liking may be the 
result of their participation in the process of 
test development. The in-service training
value of the experience cannot be evaluated 
objectively, but it almost certainly is con- 
siderable. 

On the other side of the coin, there is some 
question in the minds of the writers regard- 
ing the optimal size for a group effort such as 
this. It is probably not necessary to have as 
large a group as the one that worked on this
project, and increased numbers multiply the
problems of communication, meetings, unity 
of objectives, etc. Similarly, with regard 
to technical consultants to test development 
projects, there would probably be some 
advantage to having them affiliated with the 
same university, and it should not usually be 
necessary to have four of them. 

Finally, note should be taken of the symbi- 
otic properties of a project such as this one. 
Except in the largest school districts, tech- 
nical specialists in test development are not 
available within the school structure; nor 
does one profess for long in the field of educa- 
tion in a university without feeling the need 
for more recent and intimate contact with 
the everyday problems and activities of 
public schools. Cooperative projects like 
this one can be mutually satisfying in these 
respects, and regardless of the ultimate value 
of being able to predict achievement (Regents
or otherwise), if such predictions are
going to be made, then they should be as 
efficient and accurate as possible, thus, if not 
benefiting the pupils, at least harming them 
less. 

The project reported here was supported by the
Office of Education Research, New York State Ed- 
ucation Department, and the fourteen cooperating 
school districts. 

References 

1. Thorndike, R. L., and Elizabeth Hagen, Measurement
and Evaluation in Psychology and Education,
Wiley, New York, 1961, p. 289.

2. Ahmann, J. S., and M. D. Glock, Evaluating
Pupil Growth, Allyn and Bacon, Boston, 1959,
p. 350.

3. Nunnally, J. C., Tests and Measurements,
McGraw-Hill, New York, 1959, pp. 270f.

4. Degering, E. F., et al., Cooperative Objective
Unit Tests in Organic Chemistry, E. F. Degering,
Natick, Mass., 1950.

5. Ahmann, J. S., and M. D. Glock, Evaluating
Pupil Growth, Allyn and Bacon, Boston, 1959,
p. 187.

6. Bloom, B. S., Ed., Taxonomy of Educational
Objectives, Part I: The Cognitive Domain, David
McKay, New York, 1956.

7. Dressell, P. L., and D. H. Nelson, Questions
and Problems in Science, Folio 1, Educational
Testing Service, Princeton, 1956.

8. Gullikson, H., Theory of Mental Tests, Wiley,
New York, 1950.

9. Winter, S. S., S. D. Farr, J. J. Montean, and
J. A. Schmitt, A Multi-district Study of Achievement
in New York State Regents Chemistry When Taught
in Conventional and Large-group Classes, The Educational
Research Center, State University of New
York, Buffalo, New York, 1965.

LOCALLY ORIENTED CHEMISTRY ACHIEVEMENT TESTS 91




JOURNAL OF RESEARCH IN SCIENCE TEACHING
VOL. 4, PP. 92-94 (1966)

Hiding behind Course Titles*

According to Brandwein, 1 among the
major teaching skills which are rated excel- 
lent by their principals and colleagues are 
those skills useful in confronting students 
with interesting objects and events, in 
questioning them, and in leading discus- 
sion to sharpen their ability for designing 
a critical investigation. This description 
not only comes close to indicating the major 
desirable characteristics of science teachers 
today but also for tomorrow. 

Introduction 

In one’s early days of teaching science, 
the opinion is usually held that brilliance,
coupled with knowledge of subject matter, 
is the only prerequisite to becoming an 
effective science teacher, a teacher with the 
characteristics just described. Experience 
and research indicate otherwise. 2, 3

What to some seems to be an easy solu- 
tion to ineffective science teaching, truly, 
is not so easy. Today, however, we do 
have some results of reputable research 
to refer to for answers. I suggest we continue 
to pursue this research and begin to use the 
findings in developing a pattern for under- 
graduate experiences in the training of 
future secondary school science teachers.
We should not continue to make the mis- 
take of believing the answer to developing 
effective teachers lies solely in a list of pre- 
scribed courses! 

Instead, it seems reasonable in years to 
come to establish a means of selecting from 
the population those people who we can 
predict with reasonable certainty will be- 
come effective science teachers, and then 



outline the experiences (not course titles) 
that will develop to the fullest possible 
extent in these people behaviors so essential 
to a productive, successful, and happy 
science teaching career. 

After this, perhaps we might spell out the 
major environmental factors of the under- 
graduate schools and the first few years of 
teaching that will nurture and continue 
to develop good science teaching. And 
at the same time it might be well to pre- 
pare an alternate list of factors that destroy 
the will and the desire to mature in the com- 
plex art and science of science teaching. 

Except for science content and the kind 
of practical information usually obtained 
in the undergraduate science methods 
course, our problem appears to be little 
different than that of many groups at- 
tempting to develop effective undergraduate 
training programs for teachers, since research
findings indicate that superior qualities
in teachers are common to many academic
fields. 4

Role of Content Courses 

If, as already implied, the behavior pat- 
terns we seek can be developed through 
appropriate experiences in many sequences
of content courses (and perhaps this is true 
as well for professional courses), we then 
should leave to the colleges the responsibility
of deciding the nature and order of courses 
to be included in the undergraduate science 

* Excerpted from an introductory paper presented 
at the Eastern Region Association for the Education 
of Teachers in Science meeting, Keene, New Hampshire,
May 8, 1965.